Document Selection Using Mapreduce

Similar resources

Ontology Based Document Clustering Using MapReduce

Document clustering is now considered a data-intensive task because of the rapid increase in the number of available documents. Moreover, the feature sets that represent those documents are also very large. The most common method for representing documents is the vector space model, which represents document features as a bag of words and does not represent semantic relations betwe...

Feature Selection in High-Dimensional Dataset Using MapReduce

This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving ...

Web Document Clustering Using Threshold Selection Partitioning

Clustering techniques have been applied to categorize documents on the World Wide Web. PDDP (Principal Direction Divisive Partitioning) is a well-known clustering algorithm from previous research. The PDDP algorithm employs top-down, unsupervised clustering based on principal component analysis and splits documents into two sets using a plane perpendicular to the maximum principal direction passi...

A MapReduce Relational-Database Index-Selection Tool

The physical design of data storage is a critical administrative task for optimizing system performance. Selecting indices properly is a fundamental aspect of the system design. Index selection optimization has been widely studied in DataBase Management Systems (DBMSs). However, current DBMSs are not appropriate platforms for much of today's data. As a result, several systems have been developed t...

Design and Implement of Distributed Document Clustering Based on MapReduce

In this paper, we describe how document clustering for large collections can be efficiently implemented with MapReduce. The Hadoop implementation provides a convenient and flexible framework for distributed computing on a cluster of commodity machines. The design and implementation of the tfidf and K-Means algorithms on MapReduce are presented. More importantly, we improved the efficiency and effectivenes...

Journal

Journal title: International Journal of Security, Privacy and Trust Management

Year: 2015

ISSN: 2319-4103, 2277-5498

DOI: 10.5121/ijsptm.2015.4401